Another set of conditions for Markov decision processes with average sample-path costs

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Markov decision processes with observation costs

A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process in which observation of the process state can be imperfect and/or costly. Although it provides an elegant model for control and planning problems that include information-gathering actions, the best current algorithms for POMDPs are computationally infeasible for all but small problems. One a...

متن کامل

Learning Algorithms for Markov Decision Processes with Average Cost

This paper gives the first rigorous convergence analysis of analogs of Watkins’ Q-learning algorithm, applied to average cost control of finite-state Markov chains. We discuss two algorithms which may be viewed as stochastic approximation counterparts of two existing algorithms for recursively computing the value function of average cost problem the traditional relative value iteration algorith...

متن کامل

Sample-Path Optimal Stationary Policies in Stable Markov Decision Chains with the Average Reward Criterion

Abstract. This work concerns discrete-time Markov decision chains with denumerable state and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function `. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that, if the expected average reward associated to ...

متن کامل

Average-Reward Decentralized Markov Decision Processes

Formal analysis of decentralized decision making has become a thriving research area in recent years, producing a number of multi-agent extensions of Markov decision processes. While much of the work has focused on optimizing discounted cumulative reward, optimizing average reward is sometimes a more suitable criterion. We formalize a class of such problems and analyze its characteristics, show...

متن کامل

l AVERAGE COST SEMI - MARKOV DECISION PROCESSES

^ The Semi-Markov Decision model is considered under the criterion of long-run average cost. A new criterion, which for any policy considers the limit of the expected cost Incurred during the first n transitions divided by the expected length of the first n transitions, is considered. Conditions guaranteeing that an optimal stationary (nonrandomized) policy exist are then presented. It is also ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Journal of Mathematical Analysis and Applications

سال: 2006

ISSN: 0022-247X

DOI: 10.1016/j.jmaa.2006.02.050